Finding Competitive Network Architectures Within a Day Using UCT
The design of neural network architectures for a new data set is a laborious
task that requires human deep learning expertise. To make deep learning
available to a broader audience, automated methods for finding a neural
network architecture are vital. Recently proposed methods already achieve
human-expert-level performance. However, these methods have run times of
months or even years of GPU computing time, ignoring the hardware constraints
faced by many researchers and companies. We propose the use of Monte Carlo
planning in combination with two different UCT (upper confidence bound applied
to trees) derivations to search for network architectures. We adapt the UCT
algorithm to the needs of network architecture search by proposing two ways of
sharing information between different branches of the search tree. In an
empirical study, we demonstrate that this method finds competitive networks
for MNIST, SVHN, and CIFAR-10 in just a single GPU day. Extending the search
time to five GPU days, we outperform human-designed architectures and
competing methods that consider the same types of layers.
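The UCT rule the abstract builds on scores each child of a search-tree node by its average reward plus an exploration bonus. A minimal sketch of that UCB1-style score (the function name and the exploration constant c are illustrative, not taken from the paper):

```python
import math

def uct_score(total_reward, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 applied to trees: mean reward plus an exploration bonus.

    Unvisited children get an infinite score, so every branch is
    expanded at least once before exploitation kicks in.
    """
    if child_visits == 0:
        return float("inf")
    exploitation = total_reward / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```

In architecture search, the "reward" of a branch would be the validation accuracy of networks sampled below it; the bonus shrinks as a branch accumulates visits, steering the search toward less-explored parts of the tree.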
Supervising the Multi-Fidelity Race of Hyperparameter Configurations
Multi-fidelity (gray-box) hyperparameter optimization techniques (HPO) have
recently emerged as a promising direction for tuning Deep Learning methods.
However, existing methods suffer from a sub-optimal allocation of the HPO
budget to the hyperparameter configurations. In this work, we introduce DyHPO,
a Bayesian Optimization method that learns to decide which hyperparameter
configuration to train further in a dynamic race among all feasible
configurations. We propose a new deep kernel for Gaussian Processes that embeds
the learning curve dynamics, and an acquisition function that incorporates
multi-budget information. We demonstrate the significant superiority of DyHPO
over state-of-the-art hyperparameter optimization methods through
large-scale experiments comprising 50 datasets (Tabular, Image, NLP) and
diverse architectures (MLP, CNN/NAS, RNN). Comment: Accepted at NeurIPS 202
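The "dynamic race" the abstract describes can be pictured as a loop that repeatedly advances the single most promising configuration by one budget unit instead of committing budgets upfront. The skeleton below is a hypothetical sketch: `train_one_step` and `score` stand in for actual training and for the paper's deep-kernel GP plus multi-budget acquisition function, which are not reproduced here.

```python
def dynamic_race(configs, train_one_step, score, total_budget):
    """DyHPO-style greedy budget allocation (illustrative sketch).

    configs:        hyperparameter configurations in the race
    train_one_step: callable(config, step) -> observed metric after one
                    more budget unit (e.g. one epoch)
    score:          callable(config, history) -> acquisition value; in
                    DyHPO this would come from a GP over learning curves
    total_budget:   total number of budget units to spend
    """
    history = {c: [] for c in configs}
    for _ in range(total_budget):
        # At every step, re-rank all feasible configurations and give
        # one more budget unit to the current acquisition maximizer.
        best = max(configs, key=lambda c: score(c, history[c]))
        history[best].append(train_one_step(best, len(history[best])))
    return history
```

The key design point is that allocation decisions are revisited after every unit of budget, so a configuration that starts slowly can still overtake the race once its learning-curve dynamics look promising.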